Enhancing the scalability and load balancing of the parallel selected inversion algorithm via tree-based asynchronous communication
Abstract
We develop a method for improving the parallel scalability of the recently developed parallel selected inversion algorithm [Jacquelin, Lin and Yang 2014], named PSelInv, on massively parallel distributed memory machines. In the PSelInv method, we compute selected elements of the inverse of a sparse matrix A that can be decomposed as A = LU, where L is lower triangular and U is upper triangular. Updating these selected elements of A^{-1} requires restricted collective communications among a subset of processors within each column or row communication group created by a block cyclic distribution of L and U. We describe how this type of restricted collective communication can be implemented by using asynchronous point-to-point MPI communication functions combined with a binary tree based data propagation scheme. Because multiple restricted collective communications may take place at the same time in the parallel selected inversion algorithm, we need to use a heuristic to prevent processors participating in multiple collective communications from receiving too many messages. This heuristic allows us to reduce communication load imbalance and improve the overall scalability of the selected inversion algorithm. For instance, when 6,400 processors are used, we observe over 5x speedup for test matrices. It also mitigates the performance variability introduced by an inhomogeneous network topology.
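The binary tree based propagation mentioned in the abstract can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions, not the PSelInv implementation: it makes no MPI calls, the heap-style tree layout is one possible choice, and the names `binary_tree_children`, `simulate_broadcast`, and the rank list `group` are hypothetical. It compares a flat broadcast, where the root sends to every participant, with a binary tree in which each participant forwards to at most two others.

```python
from collections import deque


def binary_tree_children(ranks):
    """Map each rank to the ranks it forwards to (heap-style binary tree).

    ranks[0] is the root (the data owner); the rank at position i forwards
    to the ranks at positions 2*i + 1 and 2*i + 2 when they exist.
    This layout is an assumption made for illustration.
    """
    children = {}
    for i, rank in enumerate(ranks):
        children[rank] = [ranks[j] for j in (2 * i + 1, 2 * i + 2) if j < len(ranks)]
    return children


def simulate_broadcast(ranks, children):
    """Propagate one message from ranks[0].

    Returns (sends, depth): messages sent by each rank, and the number of
    hops after which each rank has received the data.
    """
    sends = {r: 0 for r in ranks}
    depth = {ranks[0]: 0}
    queue = deque([ranks[0]])
    while queue:
        src = queue.popleft()
        for dst in children[src]:
            sends[src] += 1
            depth[dst] = depth[src] + 1
            queue.append(dst)
    return sends, depth


# Hypothetical subset of ranks taking part in one restricted broadcast.
group = [3, 7, 11, 15, 19, 23, 27]

# Flat scheme: the root sends directly to every other participant.
flat = {group[0]: group[1:], **{r: [] for r in group[1:]}}
tree = binary_tree_children(group)

flat_sends, flat_depth = simulate_broadcast(group, flat)
tree_sends, tree_depth = simulate_broadcast(group, tree)

print(max(flat_sends.values()), max(flat_depth.values()))  # 6 1
print(max(tree_sends.values()), max(tree_depth.values()))  # 2 2
```

With seven participants, the flat scheme makes the root send six messages, while the tree caps every participant at two sends at the cost of one extra hop. In an actual MPI code each forward would presumably be a nonblocking send such as `MPI_Isend` matched by an `MPI_Irecv`, so that several such trees can make progress at the same time.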
Similar resources
PSelInv - A Distributed Memory Parallel Algorithm for Selected Inversion: the non-symmetric Case
This paper generalizes the parallel selected inversion algorithm called PSelInv to sparse nonsymmetric matrices. We assume a general sparse matrix A has been decomposed as PAQ = LU on a distributed memory parallel machine, where L and U are lower and upper triangular matrices, and P and Q are permutation matrices, respectively. The PSelInv method computes selected elements of A^{-1}. The selection is confin...
A Fast Parallel Algorithm for Selected Inversion of Structured Sparse Matrices with Application to 2D Electronic Structure Calculations
An efficient parallel algorithm is presented and tested for computing selected components of H^{-1}, where H has the structure of a Hamiltonian matrix of two-dimensional lattice models with local interaction. Calculations of this type are useful for several applications, including electronic structure analysis of materials in which the diagonal elements of the Green's functions are needed. The algo...
Better Algorithms for Parallel Backtracking
Many algorithms in operations research and artificial intelligence are based on the backtracking principle, i.e., depth-first search in implicitly defined trees. For parallelizing these algorithms, a load balancing scheme is needed which is able to evenly distribute the parts of an irregularly shaped tree over the processors. It should work with minimal interprocessor communication and without prio...
Balancing the Communication Load of Asynchronously Parallelized Machine Learning Algorithms
Stochastic Gradient Descent (SGD) is the standard numerical method used to solve the core optimization problem for the vast majority of machine learning (ML) algorithms. In the context of large scale learning, as utilized by many Big Data applications, efficient parallelization of SGD is a focus of active research. Recently, we were able to show that the asynchronous communication paradigm...
Parallel search algorithm for the detection of irregular structures
Search algorithms for the Detection of Irregular Structures are extremely difficult to parallelize efficiently, due to their non-local nature that makes load balancing a major problem. In this work some algorithms are investigated and implemented for Multiple Cluster and Single Cluster Search problems. A new (asynchronous) approach for the Single Cluster Search problem is also presented, giving a h...
Journal: CoRR
Volume: abs/1504.04714
Publication date: 2015